RBD analysis report

Published

March 10, 2025

Introduction

This is the WNP report for RBD. The report is interactive: click around on the plots and use the download buttons to export the tables/sequences for further analysis!

Want to know more about how the results were generated? Check the bottom of the report 👇

Top 100 most abundant sequences in each round

The trees in this section visualise the 100 most abundant clones in each round. Abundance is measured in counts per million (CPM), which normalises for sequencing depth, so values can be compared between rounds.
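As a rough illustration of why CPM values are comparable between rounds, the sketch below scales raw read counts by the total number of reads in a round. The function name and the example counts are hypothetical and are not taken from the pipeline itself.

```python
def counts_per_million(counts):
    """Convert raw per-sequence read counts to counts per million (CPM).

    `counts` maps a sequence identifier to its raw read count in one round.
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("cannot compute CPM for a round with no reads")
    return {seq: count * 1_000_000 / total for seq, count in counts.items()}


# Hypothetical rounds sequenced at very different depths.
round_1 = {"clone_A": 50, "clone_B": 30, "clone_C": 20}
round_2 = {"clone_A": 900, "clone_B": 100}

cpm_1 = counts_per_million(round_1)
cpm_2 = counts_per_million(round_2)
print(cpm_1["clone_A"], cpm_2["clone_A"])
```

Because each round is divided by its own total, a clone's CPM reflects its share of the library, not how deeply that round was sequenced.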

Round 1

using Gonnet

Round 2

using Gonnet

Enriched clusters

PCA plot & top 100s

Tree of the top 100 most enriched clusters

using Gonnet

Analysis method

Raw sequencing reads are first trimmed with Trim Galore to remove sequencing adapters, then merged with FLASH, because two paired reads are required to cover the entire nanobody sequence. Quality control of the run is then performed with MultiQC. To identify the important parts of the nanobody sequence (for example the germline genes and CDRs), IgBLAST is used with a custom alpaca reference created from VDJ sequences deposited in the IMGT database.

Before analysis, reads that likely represent sequencing errors are removed: those containing frameshifts or stop codons, or with very low counts per million (CPM). Filtered nanobody sequences are then tracked through the panning process to determine their enrichment, measured as the log2 fold change from Round 0 (before panning) to the end of the panning process (usually Round 2). Enriched nanobodies are then clustered to remove redundancy and, if there are a large number of clusters, a top 100 is chosen.
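The filtering and enrichment steps can be sketched as follows. This is a minimal illustration and not the pipeline's actual code: the record fields, the `min_cpm` threshold, the choice to filter on the highest CPM seen in either round, and the pseudocount used to avoid dividing by zero are all assumptions made for the example.

```python
import math


def passes_filter(record, min_cpm=1.0):
    """Return True if a read is unlikely to be a sequencing error.

    Assumed (hypothetical) fields: `in_frame`, `aa_seq`, `cpm_round0`, `cpm_round2`.
    """
    if not record["in_frame"]:          # frameshift
        return False
    if "*" in record["aa_seq"]:         # stop codon in the translation
        return False
    # Assumption: judge "very low CPM" on the highest CPM across rounds.
    return max(record["cpm_round0"], record["cpm_round2"]) >= min_cpm


def log2_fold_change(cpm_start, cpm_end, pseudocount=1.0):
    """Log2 fold change between two rounds; the pseudocount is an assumption."""
    return math.log2((cpm_end + pseudocount) / (cpm_start + pseudocount))


reads = [
    {"id": "nb1", "aa_seq": "QVQLVESGG", "in_frame": True, "cpm_round0": 1.0, "cpm_round2": 7.0},
    {"id": "nb2", "aa_seq": "QVQL*ESGG", "in_frame": True, "cpm_round0": 5.0, "cpm_round2": 9.0},
    {"id": "nb3", "aa_seq": "QVQLVESGG", "in_frame": False, "cpm_round0": 4.0, "cpm_round2": 8.0},
    {"id": "nb4", "aa_seq": "EVQLVESGG", "in_frame": True, "cpm_round0": 0.1, "cpm_round2": 0.2},
]

kept = [r for r in reads if passes_filter(r)]
enrichment = {r["id"]: log2_fold_change(r["cpm_round0"], r["cpm_round2"]) for r in kept}
print(enrichment)
```

In this toy input only `nb1` survives filtering; the other three are dropped for a stop codon, a frameshift and low abundance, respectively.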

%%{
  init: {
    'theme': 'base',
    'themeVariables': {
      'primaryColor': '#fbf0ed',
      'primaryTextColor': '#e83e8c',
      'primaryBorderColor': '#e83e8c',
      'lineColor': '#FF784F',
      'secondaryColor': '#006100',
      'tertiaryColor': '#fff'
    }
  }
}%%

flowchart TD
import["Raw sequencing read pairs"] --> trim_merge["Trim and merge raw reads"]
trim_merge --> seqQC["Quality control of sequencing run"]
seqQC --> annotate["Annotate nanobody germline genes, CDRs"]
annotate --> filtering["Filter to remove sequencing errors"]
filtering --> enrichment["Determine enrichment (log2 fold change) across panning rounds"]
enrichment --> clustering["Group similar nanobodies into clusters"]
clustering --> top_100["If required, narrow enriched clusters down to a top 100"]